ABSTRACT

Estimation of systematic risk is one of the most important aspects
of investment analysis, and has attracted the attention of many
researchers. In spite of substantial contributions in the recent past,
there still remains room for improvement in the methodologies
currently available for forecasting systematic risk. This paper is
concerned with some improved methods of estimating systematic risk for
individual securities. We use Bayesian analysis with hierarchical and
non-normal priors.


1.   Introduction 

The central model in most of the research pertaining to systematic
risk has been the single index model

$$R_{it} = \alpha_i + \beta_i R_{mt} + \epsilon_{it}, \qquad i = 1,2,\ldots,N \tag{1}$$

where $R_{it}$ and $R_{mt}$ are, respectively, the random return on
security $i$ and the corresponding random market return in period $t$,
$\alpha_i$ and $\beta_i$ are the regression parameters appropriate to
security $i$, and $\epsilon_{it}$ is the random disturbance term with
mean zero and variance $\sigma_i^2$. The parameter $\beta_i$, called
beta, measures the systematic risk of security $i$ and is defined as
$\mathrm{Cov}(R_i, R_m)/\mathrm{Var}(R_m)$.
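The definition of beta as a covariance-to-variance ratio can be illustrated with a minimal sketch. The function name `ols_beta` and the return series below are hypothetical and are not taken from the paper; the sketch simply computes the sample analogue of Cov(R_i, R_m)/Var(R_m) from deviations about the sample means.

```python
def ols_beta(security_returns, market_returns):
    """Sample analogue of Cov(R_i, R_m) / Var(R_m) for the single index model."""
    if len(security_returns) != len(market_returns):
        raise ValueError("return series must have the same length")
    n = len(market_returns)
    mean_i = sum(security_returns) / n
    mean_m = sum(market_returns) / n
    # Deviations from the sample means
    y = [r - mean_i for r in security_returns]
    x = [r - mean_m for r in market_returns]
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    if sxx == 0:
        raise ValueError("market returns have zero variance")
    return sxy / sxx


# Hypothetical data: security return = 0.5 + 2 * market return, no noise
market = [0.01, -0.02, 0.03, 0.00, 0.015]
security = [0.5 + 2.0 * r for r in market]
beta = ols_beta(security, market)
```

With noiseless data of this form the slope is recovered up to floating-point error, and the intercept drops out because both series are centred.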

Estimation of this systematic risk is one of the most important
aspects of investment analysis and has attracted the attention of many
researchers. Investors use betas to evaluate the relative risk of
different portfolios. In the futures market context, betas of
different stock portfolios are needed to calculate the number of
contracts to be bought or sold. In spite of substantial contributions
in the recent past, there still remains room for improvement in the
methodologies currently available to forecast betas.

Blume (1971) observed that over time betas appear to take less
extreme values and exhibit a tendency towards the market risk. This
would mean that the historical betas based on ordinary least squares
(OLS) estimation would be poor estimators of the future betas.
Therefore, it is necessary to adjust the OLS estimators of $\beta_i$.
Vasicek (1973) suggested a Bayesian adjustment technique using a normal
prior for $\beta_i$. As will be clear from the subsequent discussion,
Vasicek's procedure has some drawbacks. It utilizes the information
from the other stocks only through the cross-sectional mean and
variance. More than 2,000 stocks are traded on the New York Stock
Exchange; improved estimates for one stock might be obtained by
combining the data from the other stocks as far as possible. We propose
to do this utilizing Lindley and Smith's (1972) hyperparameter model
and the concept of exchangeable priors. Under this framework, the
parameters of our linear model (1) will themselves have a general
linear structure in terms of other quantities, which are called
hyperparameters. Exchangeability means that the joint distribution of
the $\beta_i$'s is unaltered by any permutation of the suffixes. This
assumption is weaker than the traditional independent and identically
distributed (IID) set up. Lindley and Smith's linear hierarchical model
has been used in many econometric applications; see, e.g., Trivedi
(1980), Haitovsky (1986), Ilmakunnas (1986), and Kadiyala and
Oberhelman (1986). In Section 2, we set up the model in a convenient
form and carry out the Bayesian analysis using the hierarchical model.

Recently, Bera and Kannan (1986) studied extensively the empirical
distribution of betas. They considered the time period from July 1948
through June 1983, and divided that period into seven non-overlapping
estimation periods of 60 months each. They found that the empirical
distributions of betas were highly positively skewed and often
platykurtic. However, with a square-root transformation the values
of skewness and kurtosis changed in such a way that, using the Jarque
and Bera (1987) test statistic, the normality hypothesis could be
accepted in four out of seven periods. Also, the values of the test
statistic in the remaining three periods were not very high. Therefore,
it appears that beta has a root-normal distribution, i.e., the square
root of the variable is normal. This finding casts some doubt on the
validity of Vasicek's selection of normal priors. Therefore, our
second aim is to do a Bayesian analysis assuming that $\sqrt{\beta_i}$
is normally distributed, or in other words, that $\beta_i$ has a
noncentral $\chi^2$ distribution with one degree of freedom. In
Section 3, we generalize our results of Section 2 by considering a
hierarchical model with non-normal priors, while in Section 4 we go
back to Vasicek's set up and combine it with our non-central
chi-square prior distribution and its various approximations. In the
last section of the paper, some concluding remarks are offered.

After the publication of Vasicek's paper in 1973, to our knowledge,
there has been no work which attempts to improve upon it. Also, in
the statistics literature, most Bayesian regression analyses are based
on normal priors, primarily because of their simplicity. Since here we
have some empirical evidence on the distribution of betas, it is
appropriate that we utilize that information in the analysis. We
hope this will lead to improved estimation of betas.

2.   Analysis  with  Hierarchical  Priors 

Since we are interested only in the $\beta_i$ parameters, it is
convenient to work with the model in deviation form

$$y_{it} = \beta_i x_{it} + u_{it} \tag{2}$$

where $y_{it} = R_{it} - \bar{R}_i$, $x_{it} = R_{mt} - \bar{R}_m$ and
$u_{it} = \epsilon_{it} - \bar{\epsilon}_i$. So we will have
$u_i = (u_{i1},\ldots,u_{iT})' \sim N(0, \rho_i^2 I_T)$ where
$\rho_i^2 = (1 - \frac{1}{T})\sigma_i^2$. Under the classical
framework, the maximum likelihood or the OLS estimator of $\beta_i$ is
given by

$$\hat{\beta}_i = \frac{\sum_{t=1}^{T} x_{it} y_{it}}{\sum_{t=1}^{T} x_{it}^2}, \qquad i = 1,2,\ldots,N$$

with $V(\hat{\beta}_i) = \rho_i^2 / \sum_{t=1}^{T} x_{it}^2 = S_i^2$.
Vasicek (1973) suggested a Bayesian approach with a normal prior for
$\beta_i$,

$$\beta_i \sim N(\bar{\beta}, \phi^2) \tag{3}$$

and a non-informative prior density for $\rho_i$,

$$\pi(\rho_i) \propto \frac{1}{\rho_i}.$$

The standard Bayes estimator is the posterior mean

$$E(\beta_i \mid y_i) = \frac{(\bar{\beta}/\phi^2) + (\hat{\beta}_i/S_i^2)}{(1/\phi^2) + (1/S_i^2)} \tag{4}$$

and the posterior variance is given by
$((1/\phi^2) + (1/S_i^2))^{-1}$. Vasicek suggested using the mean and
variance of the cross-sectional betas in place of $\bar{\beta}$ and
$\phi^2$, respectively. Empirical results in Bera and Kannan (1986,
Tables VII and VIII) show that forecasts based on Vasicek's adjusted
betas are superior to those based on the OLS (unadjusted) betas. This
indicates that we can improve prediction performance for a security by
pooling information from other securities. Let us now cast the model
in Lindley and Smith's hierarchical framework.
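The Vasicek adjustment in (4) is a precision-weighted average, which the following minimal sketch makes concrete. The function name `vasicek_adjust` and all numerical inputs are hypothetical, and the sampling variance $S_i^2$ is treated as known, which is an assumption made only for illustration.

```python
def vasicek_adjust(beta_ols, s2, prior_mean, prior_var):
    """Posterior mean and variance as in equation (4): a precision-weighted
    average of the OLS beta and the cross-sectional prior mean."""
    if s2 <= 0 or prior_var <= 0:
        raise ValueError("variances must be positive")
    prior_precision = 1.0 / prior_var   # 1 / phi^2
    data_precision = 1.0 / s2           # 1 / S_i^2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_mean * prior_precision
                            + beta_ols * data_precision)
    return post_mean, post_var


# Hypothetical inputs: OLS beta 1.5 with sampling variance 0.04,
# cross-sectional mean 1.0 and variance 0.04 (equal precisions).
post_mean, post_var = vasicek_adjust(1.5, 0.04, 1.0, 0.04)
```

When the two precisions are equal, as in this example, the adjusted beta lies halfway between the OLS estimate and the cross-sectional mean, and the posterior variance is half of either input variance.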
From (2) we can write

$$y_i \mid \beta_i \sim N(x_i \beta_i, \rho_i^2 I_T), \qquad i = 1,2,\ldots,N \tag{5}$$

where $y_i = (y_{i1}, y_{i2}, \ldots, y_{iT})'$ and
$x_i = (x_{i1}, x_{i2}, \ldots, x_{iT})'$.

Next we assume exchangeability among the $\beta_i$, specifically

$$\beta_i \mid \xi \sim N(\xi, \tau^2) \tag{6}$$

with a second stage non-informative prior for $\xi$. Due to the
randomness of $\xi$, this is a weaker assumption than the IID
assumption in (3). To see it clearly, note that the joint prior
distribution of $\beta_1, \beta_2, \ldots, \beta_N$ is given by

$$\pi(\beta_1, \beta_2, \ldots, \beta_N) = \int \prod_{i=1}^{N} \pi(\beta_i \mid \xi)\, f(\xi)\, d\xi$$

where $f(\xi)$ is the probability density function of $\xi$. Therefore,
$\pi(\beta_1, \beta_2, \ldots, \beta_N)$ is a mixture of IID
distributions conditional on $\xi$, but unconditionally the joint
distribution does not satisfy the IID assumption. The above
specification is a simple special case of Lindley and Smith's (1972,
p. 6) general hierarchical model

$$y \mid \theta_1 \sim N(A_1\theta_1, C_1)$$

$$\theta_1 \mid \theta_2 \sim N(A_2\theta_2, C_2)$$

$$\theta_2 \mid \theta_3 \sim N(A_3\theta_3, C_3)$$

with $C_3^{-1} = 0$. For this model Bayesian inference can be drawn
from the posterior for $\theta_1$ given $\{A_i\}$, $\{C_i\}$, and $y$,
which is given by $N(Dd, D)$ where

$$D^{-1} = A_1' C_1^{-1} A_1 + C_2^{-1} - C_2^{-1} A_2 (A_2' C_2^{-1} A_2)^{-1} A_2' C_2^{-1}$$

and $d = A_1' C_1^{-1} y$.

Identifying the appropriate values of $A_i$, $C_i$ and $\theta_i$ for
our specification (5) and (6), we obtain the following posterior
distribution for $\beta = (\beta_1, \beta_2, \ldots, \beta_N)'$ as
$N(Dd, D)$ where

$$D^{-1} = \mathrm{diag}\left(\frac{x_1'x_1}{\rho_1^2} + \frac{1}{\tau^2}, \ldots, \frac{x_N'x_N}{\rho_N^2} + \frac{1}{\tau^2}\right) - \frac{J_N}{N\tau^2}, \tag{7}$$

where $J_N$ is an $N \times N$ matrix all of whose elements are one,
and

$$d = \left(\frac{x_1'y_1}{\rho_1^2}, \ldots, \frac{x_N'y_N}{\rho_N^2}\right)'. \tag{8}$$

Using the above expression, for the i-th security the estimate of the
systematic risk under a quadratic loss function can be expressed as

$$\beta_i^* = \left(\frac{x_i'x_i}{\rho_i^2} + \frac{1}{\tau^2}\right)^{-1} \frac{x_i'x_i}{\rho_i^2}\, \hat{\beta}_i + \left(\frac{x_i'x_i}{\rho_i^2} + \frac{1}{\tau^2}\right)^{-1} \frac{1}{\tau^2} \sum_{j=1}^{N} w_j \hat{\beta}_j \tag{9}$$

where

$$w_j = \left[\sum_{i=1}^{N} \frac{x_i'x_i}{\tau^2 x_i'x_i + \rho_i^2}\right]^{-1} \frac{x_j'x_j}{\tau^2 x_j'x_j + \rho_j^2}.$$
A quick comparison of (4) and (9) reveals that both estimates are
linear combinations of the OLS estimator and the mean of the
cross-section betas, but for the hierarchical estimate a weighted
average of the cross-section betas is used instead of the simple
average used in the Vasicek case. $\beta_i^*$ can also be expressed as

$$\beta_i^* = \frac{\hat{\beta}_i/(\rho_i^2/x_i'x_i) + \bar{\beta}^*/\tau^2}{1/(\rho_i^2/x_i'x_i) + 1/\tau^2}$$

where $\bar{\beta}^* = \sum_{j=1}^{N} \beta_j^*/N$. This formula, which
is directly comparable with (4), reveals how the information from other
securities is used in estimating the systematic risk for the i-th
security. Unlike in (4), the information conveyed by other stocks,
which is reflected in $\bar{\beta}^*$, is incorporated in a
self-fulfilling way, in the sense that the cross-sectional average
beta is consistent with the estimates for individual securities.
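The self-consistency property described above can be checked numerically. The sketch below computes the hierarchical estimates from the weighted-average form in (9) and compares their simple average with the weighted cross-section beta. The function name `hierarchical_betas` and every input value are hypothetical, and $\rho_i^2$ and $\tau^2$ are treated as known, which is an assumption made here for illustration only.

```python
def hierarchical_betas(beta_ols, xx, rho2, tau2):
    """Posterior means in the form of equation (9).

    beta_ols : OLS betas, one per security
    xx       : x_i'x_i for each security
    rho2     : disturbance variances rho_i^2 (treated as known here)
    tau2     : first-stage prior variance tau^2
    """
    # Unnormalised weights x_j'x_j / (tau^2 x_j'x_j + rho_j^2)
    raw = [q / (tau2 * q + r) for q, r in zip(xx, rho2)]
    total = sum(raw)
    w = [v / total for v in raw]
    # Weighted average of the cross-section OLS betas
    pooled = sum(wj * b for wj, b in zip(w, beta_ols))
    estimates = []
    for b, q, r in zip(beta_ols, xx, rho2):
        a = q / r                      # data precision x_i'x_i / rho_i^2
        shrink = a / (a + 1.0 / tau2)  # weight on the security's own OLS beta
        estimates.append(shrink * b + (1.0 - shrink) * pooled)
    return estimates, pooled


# Hypothetical inputs for three securities
beta_ols = [0.6, 1.0, 1.7]
xx = [0.05, 0.20, 0.02]
rho2 = [0.004, 0.004, 0.004]
beta_star, pooled = hierarchical_betas(beta_ols, xx, rho2, tau2=0.09)
mean_star = sum(beta_star) / len(beta_star)
```

In this sketch `mean_star` and `pooled` agree up to rounding, which is the sense in which the cross-sectional average is consistent with the individual estimates; each estimate is also pulled from its OLS value towards the pooled beta.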

To compare the above estimate with Vasicek's, let us put his
specification in Lindley and Smith's framework as

$$y_i \mid \beta_i \sim N(x_i\beta_i, \rho_i^2 I_T)$$

$$\beta_i \mid \bar{\beta} \sim N(\bar{\beta}, \phi^2). \tag{10}$$

Here the first stage prior is "completely specified" by the
cross-section data. The counterparts of (7) and (9) are, respectively,

$$D_V^{-1} = \mathrm{diag}\left(\frac{x_1'x_1}{\rho_1^2} + \frac{1}{\phi^2}, \ldots, \frac{x_N'x_N}{\rho_N^2} + \frac{1}{\phi^2}\right) \tag{11}$$

and

$$E(\beta_i \mid y) = \left(\frac{x_i'x_i}{\rho_i^2} + \frac{1}{\phi^2}\right)^{-1} \frac{x_i'x_i}{\rho_i^2}\, \hat{\beta}_i + \left(\frac{x_i'x_i}{\rho_i^2} + \frac{1}{\phi^2}\right)^{-1} \frac{1}{\phi^2}\, \frac{1}{N}\sum_{j=1}^{N} \hat{\beta}_j. \tag{12}$$

Comparing (9) and (12), we note that to find the average systematic
risk in (12), a simple average is used, whereas under the Lindley and
Smith framework we use a weighted average. The latter is more
reasonable since the precision in estimating the systematic risk
differs across securities.
It is also interesting to note that

$$D_V^{-1} - D^{-1} = \mathrm{diag}\left(\frac{1}{\phi^2} - \frac{1}{\tau^2}, \ldots, \frac{1}{\phi^2} - \frac{1}{\tau^2}\right) + \frac{J_N}{N\tau^2}$$

and as such we cannot say much about this matrix. However, when we
put $\phi^2 = \tau^2$, i.e., when the first stage prior variances are
the same, $D_V^{-1} - D^{-1}$ is positive semi-definite. In other
words, Vasicek's estimator has higher precision. This result is not at
all surprising if we compare the prior distributions. Under the
Vasicek prior, at the second stage $V(\bar{\beta}) = 0$, whereas for
the hyperparameter model we assume a second stage non-informative
prior, i.e., $V^{-1}(\xi) = 0$.
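The positive semi-definiteness for equal first stage variances can be verified in one step. In the worked line below, $a$ denotes an arbitrary $N$-vector; this symbol is introduced here only for illustration and does not appear in the original derivation.

```latex
% With \phi^2 = \tau^2 the diagonal term vanishes, so for any a \in \mathbb{R}^N
a'\left(D_V^{-1} - D^{-1}\right)a
  = \frac{a' J_N a}{N\tau^2}
  = \frac{\left(\sum_{i=1}^{N} a_i\right)^{2}}{N\tau^2} \;\ge\; 0 .
```

The quadratic form is zero exactly when the elements of $a$ sum to zero, so the difference is semi-definite rather than definite.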


3.   Analysis of Hierarchical Model with Non-normal Prior

Vasicek used a normal prior for the cross-sectional distribution
of the beta coefficients. As mentioned earlier, the cross-sectional
betas are not normally distributed, and recent work by Bera and Kannan
(1986) indicates that their distribution tends to normal after a
square-root transformation. It is therefore natural to explore the
consequences of assuming a root-normal prior for beta, i.e.,

$$\sqrt{\beta} \sim N(\bar{\beta}, \phi^2).$$

Then $\beta/\phi^2$ will be distributed as a noncentral $\chi^2$ with
one degree of freedom and noncentrality parameter
$(\bar{\beta}/\phi)^2$, denoted by $\chi_1^2(\bar{\beta}^2/\phi^2)$.
The p.d.f. of $\beta$ can be written as

$$\pi(\beta \mid \bar{\beta}, \phi^2) = (2\pi)^{-1/2} (\beta\phi^2)^{-1/2} \exp\left\{-\frac{1}{2\phi^2}(\beta + \bar{\beta}^2)\right\} \cosh\left(\frac{\bar{\beta}\sqrt{\beta}}{\phi^2}\right) I_{(0,\infty)}(\beta) \tag{13}$$

where $\cosh(z) = (e^z + e^{-z})/2$ and $I_{(0,\infty)}$ is an
indicator function.
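As a quick check on (13), the density can be evaluated directly and compared with the change-of-variables form obtained from the two roots $z = \pm\sqrt{\beta}$ of a $N(\bar{\beta}, \phi^2)$ variable. The function names below are hypothetical, and the evaluation point 0.9 is an arbitrary choice; the hyperparameter values are the ones the text quotes for Figure 2.

```python
import math


def root_normal_pdf(beta, beta_bar, phi):
    """Density (13) of beta when sqrt(beta) ~ N(beta_bar, phi^2)."""
    if beta <= 0:
        return 0.0
    return ((2 * math.pi * beta * phi ** 2) ** -0.5
            * math.exp(-(beta + beta_bar ** 2) / (2 * phi ** 2))
            * math.cosh(beta_bar * math.sqrt(beta) / phi ** 2))


def two_normal_form(beta, beta_bar, phi):
    """Same density from the change of variables z = +/- sqrt(beta)."""
    def normal_pdf(z):
        return (math.exp(-(z - beta_bar) ** 2 / (2 * phi ** 2))
                / (phi * math.sqrt(2 * math.pi)))
    root = math.sqrt(beta)
    return (normal_pdf(root) + normal_pdf(-root)) / (2 * root)


# beta_bar = 1.03 and phi = 0.22 are the values quoted for Figure 2
p13 = root_normal_pdf(0.9, 1.03, 0.22)
p_alt = two_normal_form(0.9, 1.03, 0.22)
```

The two evaluations agree to rounding error, which reflects the identity $e^{-(\beta+\bar{\beta}^2)/2\phi^2}\cosh(\bar{\beta}\sqrt{\beta}/\phi^2) = \frac{1}{2}[e^{-(\sqrt{\beta}-\bar{\beta})^2/2\phi^2} + e^{-(\sqrt{\beta}+\bar{\beta})^2/2\phi^2}]$.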

To avoid cumbersome results and to make the comparison with
Vasicek's analysis clearer, we shall focus on the case of a single
security. Therefore, without loss of generality we can suppress
the index $i$.

We shall develop a hierarchical framework by assuming
non-informative priors for the hyperparameters $\bar{\beta}$ and
$\phi$. The hierarchical model is

$$y \mid \beta, \rho \sim N(x\beta, \rho^2 I_T)$$

$$\beta \mid \bar{\beta}, \phi \sim \phi^2 \chi_1^2(\bar{\beta}^2/\phi^2)$$

with $\pi(\rho) \propto \rho^{-1}$ and
$\pi(\bar{\beta}, \phi) \propto \phi^{-1}$.

The major difference between this set-up and Lindley and Smith's
(1972) is that now the prior for $\beta$ is not linear in the
hyperparameter $\bar{\beta}$. Because of this feature one needs the
explicit consideration of the non-informative priors for $\rho$ and
$\phi$.

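Direct application of the Bayes formula using the above setup and
the prior density in (13) yields the joint posterior p.d.f. of the
unknown parameters,

$$\pi(\bar{\beta}, \beta, \rho, \phi \mid y, x) \propto \rho^{-(T+1)} \exp\left\{-\frac{1}{2\rho^2}(y - x\beta)'(y - x\beta)\right\} \times \phi^{-2}\beta^{-1/2}\left[\exp\left\{-\frac{1}{2\phi^2}(\sqrt{\beta} + \bar{\beta})^2\right\} + \exp\left\{-\frac{1}{2\phi^2}(\sqrt{\beta} - \bar{\beta})^2\right\}\right] \tag{14}$$

for $\bar{\beta}$, $\beta$, $\rho$ and $\phi$ in $(0,\infty)$.

Noticing that (by the "normal integral")

$$\int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2\phi^2}(\sqrt{\beta} + \bar{\beta})^2\right\} d\bar{\beta}$$

is proportional to $\phi$, integration of the posterior density in (14)
with respect to $\bar{\beta}$ gives

$$\pi(\beta, \rho, \phi \mid y, x) \propto \rho^{-(T+1)} \exp\left\{-\frac{1}{2\rho^2}(y - x\beta)'(y - x\beta)\right\} \beta^{-1/2}\phi^{-1}$$

$$\propto \rho^{-(T+1)} \exp\left\{-\frac{x'x\, s_\beta^2}{2\rho^2}\left[(T-1) + \left(\frac{\beta - \hat{\beta}}{s_\beta}\right)^2\right]\right\} \beta^{-1/2}\phi^{-1}$$

where $\hat{\beta} = (x'x)^{-1}x'y$ and
$s_\beta^2 = (y - x\hat{\beta})'(y - x\hat{\beta})(x'x)^{-1}/(T-1)$.
The second line uses the decomposition
$(y - x\beta)'(y - x\beta) = (y - x\hat{\beta})'(y - x\hat{\beta}) + (\beta - \hat{\beta})^2 x'x$.
This expression clearly implies that

$$\pi(\beta, \rho \mid y, x) \propto \rho^{-(T+1)} \exp\left\{-\frac{x'x\, s_\beta^2}{2\rho^2}\left[(T-1) + \left(\frac{\beta - \hat{\beta}}{s_\beta}\right)^2\right]\right\} \beta^{-1/2}.$$

Integrating this joint density with respect to $\rho$ (performing the
change of variable $z = a/2\rho^2$, with
$a = x'x\, s_\beta^2[(T-1) + ((\beta - \hat{\beta})/s_\beta)^2]$) one
gets the marginal posterior p.d.f. of $\beta$ as

$$\pi(\beta \mid y, x) \propto \beta^{-1/2}\left\{(T-1) + \left(\frac{\beta - \hat{\beta}}{s_\beta}\right)^2\right\}^{-T/2}, \qquad \beta > 0. \tag{15}$$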

The main features of $\pi(\beta \mid y, x)$ can be seen in Figure 1
(plot for $\hat{\beta} = 1$, $s_\beta^2 = .5$, $T = 20$). It is
apparent that the posterior density is slightly skewed to the left,
and therefore, the posterior mean will be smaller than the least
squares estimator of $\beta$.
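The claim that the posterior mean falls below the least squares estimator can be explored numerically. The following is a minimal sketch that integrates the kernel in (15) by Simpson's rule after the substitution $\beta = u^2$, which removes the $\beta^{-1/2}$ singularity at zero. The function name, the truncation point of the upper tail and the input values are all assumptions made for illustration; the inputs are deliberately not the Figure 1 values.

```python
import math


def posterior_mean_15(beta_hat, s2, T, upper=None, n=20000):
    """Posterior mean under the kernel in (15), by Simpson's rule.

    With beta = u**2 we have beta**(-1/2) d(beta) = 2 du, so both the
    normalising integral and the first moment become smooth in u.
    The constant factors (2 and h/3) cancel in the ratio.
    """
    s = math.sqrt(s2)
    if upper is None:
        upper = beta_hat + 60.0 * s    # assumed truncation point for the tail
    u_max = math.sqrt(upper)
    h = u_max / n                      # n must be even for Simpson's rule

    def t_kernel(beta):
        return ((T - 1) + ((beta - beta_hat) / s) ** 2) ** (-T / 2)

    num = 0.0
    den = 0.0
    for k in range(n + 1):
        u = k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        g = t_kernel(u * u)
        den += weight * g              # integral of the posterior kernel
        num += weight * g * u * u      # integral of beta times the kernel
    return num / den


# Hypothetical inputs (not the Figure 1 values)
mean_15 = posterior_mean_15(beta_hat=1.0, s2=0.04, T=20)
```

For these inputs the computed mean lies slightly below the least squares value of one, in line with the direction of shrinkage described in the text.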

When a non-informative or normal prior for $\beta$ is assumed, the
marginal posterior of $\beta$ belongs to the t-family and the
posterior mean is equal to the least squares estimator. Now the
posterior p.d.f. is like a t-density with an additional factor
($\beta^{-1/2}$) and range $(0,\infty)$. Without the additional term
the posterior mode (as well as the mean) would be $\hat{\beta}$, the
least squares estimator. The term $\beta^{-1/2}$ moves the posterior
mode towards zero, as follows easily from the fact that
$d\pi(\beta \mid y, x)/d\beta < 0$ at $\beta = \hat{\beta}$.

It can be shown easily that the posterior mean of $\beta$ exists.
Therefore, the Bayes estimator under a quadratic loss function exists.
Indeed, noticing that for $T \ge 3$ the term

$$\left\{(T-1) + \left(\frac{\beta - \hat{\beta}}{s_\beta}\right)^2\right\}^{-T/2} \tag{16}$$

in (15) is smaller than one and that $\beta > \beta^{1/2}$ for
$\beta > 1$,

$$\int_0^\infty \beta\,\pi(\beta \mid y, x)\, d\beta < \int_0^1 \beta^{1/2}\, d\beta + \int_0^\infty \beta\left\{(T-1) + \left(\frac{\beta - \hat{\beta}}{s_\beta}\right)^2\right\}^{-T/2} d\beta.$$

Direct integration and the use of the t-family density [see, e.g.,
Rao (1973, p. 171)] reveal that the right-hand side is at most

$$\frac{2}{3} + \frac{s_\beta^2}{T-2}\left[(T-1) + \frac{\hat{\beta}^2}{s_\beta^2}\right]^{-\frac{T}{2}+1} + \hat{\beta}\, s_\beta\, (T-1)^{-\frac{T-1}{2}}\, B\left(\frac{1}{2}, \frac{T-1}{2}\right)$$

where $B(\cdot,\cdot)$ is the beta function.

The posterior mode can be computed explicitly if we approximate
the t-kernel by the normal, i.e., if one writes the posterior density
in (15) as

$$\beta^{-1/2} \exp\left\{-\frac{(\beta - \hat{\beta})^2}{2 s_\beta^2}\right\}.$$

For large data sets no crucial loss is incurred by using this
approximation. It is easy to show that if the posterior mode exists,
it is equal to

$$\frac{\hat{\beta}}{2} + \left(\frac{\hat{\beta}^2}{4} - \frac{s_\beta^2}{2}\right)^{1/2}.$$

As the first derivative of the approximate posterior density is
quadratic, the mode exists if $\hat{\beta}^2/4 \ge s_\beta^2/2$.
Otherwise, the posterior density will be monotonically decreasing,
resembling a $\chi^2$ distribution with one degree of freedom. The
mode of the above approximation to the posterior p.d.f. provides an
easy way to compute the Bayes estimator (under a 0-1 loss function)
and shows that this estimator is obtained by shrinking the OLS
estimator $\hat{\beta}$ towards zero.
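The closed-form mode is simple enough to sketch directly. The function name `approx_posterior_mode` and the input values are hypothetical, and the sketch assumes a positive OLS estimate; it returns `None` when the existence condition fails.

```python
import math


def approx_posterior_mode(beta_hat, s2):
    """Mode of beta**(-1/2) * exp(-(beta - beta_hat)**2 / (2 * s2)), beta > 0.

    Setting the derivative of the log-density to zero gives the quadratic
    beta**2 - beta_hat * beta + s2 / 2 = 0; the larger root is the mode.
    Returns None when the quadratic has no real root. Assumes beta_hat > 0.
    """
    disc = beta_hat ** 2 / 4.0 - s2 / 2.0
    if disc < 0:
        return None
    return beta_hat / 2.0 + math.sqrt(disc)


# Hypothetical inputs
mode = approx_posterior_mode(1.0, 0.25)     # interior mode below the OLS beta
no_mode = approx_posterior_mode(1.0, 1.0)   # existence condition fails
```

In the first call the mode is roughly 0.85, below the OLS value of one; in the second the condition $\hat{\beta}^2/4 \ge s_\beta^2/2$ does not hold and no interior mode is reported.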
In the following section we shall develop a non-hierarchical
approach which may be easier to compute and is more in line with
Vasicek's work. Three cases will be considered: one based on the
root-normal prior for beta and the other two based on central $\chi^2$
and normal approximations to the prior density of beta.

4.   Non-hierarchical  Approach  with  Non-normal  Prior 

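Instead of a hierarchical model, we now adopt a single stage
root-normal prior for $\beta$. As in Vasicek, the hyperparameters
$\bar{\beta}$ and $\phi$ are assumed to be known, although in
applications these will have to be estimated.

The model can now be written as

$$y \mid \beta \sim N(x\beta, \rho^2 I_T)$$

$$\beta \sim \phi^2 \chi_1^2(\bar{\beta}^2/\phi^2)$$

$$\pi(\rho) \propto \rho^{-1}.$$

The conditional (on $\bar{\beta}$ and $\phi$) joint posterior p.d.f.
of $(\beta, \rho)$ is

$$\pi(\beta, \rho \mid y, x, \bar{\beta}, \phi) \propto \rho^{-(T+1)} \exp\left\{-\frac{1}{2\rho^2}(y - x\beta)'(y - x\beta)\right\} \times \beta^{-1/2}\left[\exp\left\{-\frac{1}{2\phi^2}(\sqrt{\beta} + \bar{\beta})^2\right\} + \exp\left\{-\frac{1}{2\phi^2}(\sqrt{\beta} - \bar{\beta})^2\right\}\right].$$

Integration with respect to $\rho$ yields

$$\pi(\beta \mid y, x, \bar{\beta}, \phi) \propto \beta^{-1/2}\left[\exp\left\{-\frac{1}{2\phi^2}(\sqrt{\beta} + \bar{\beta})^2\right\} + \exp\left\{-\frac{1}{2\phi^2}(\sqrt{\beta} - \bar{\beta})^2\right\}\right] \times \left\{(T-1) + \left(\frac{\beta - \hat{\beta}}{s_\beta}\right)^2\right\}^{-T/2} \tag{17}$$

for $\beta > 0$.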
It is apparent from this expression, and confirmed by Figure 2,
that the conditional marginal p.d.f. of $\beta$ has a left tail
similar to a $\chi^2$ density. Figure 2 plots the posterior p.d.f. for
$\bar{\beta} = 1.03$, $\phi = .22$ ($\hat{\beta} = 1$,
$s_\beta^2 = .5$, $T = 20$). The values of $\bar{\beta}$ and $\phi$
were selected from the findings in Bera and Kannan (1986). For
$T \ge 3$, as we noted in (16), the last term in (17) is less than
one, and therefore,

$$\int_0^\infty \beta\,\pi(\beta \mid y, x, \bar{\beta}, \phi)\, d\beta < \int_0^\infty \beta\, k(\chi_1^2(\bar{\beta}^2/\phi^2))\, d\beta = \left(1 + \frac{\bar{\beta}^2}{\phi^2}\right)(2\pi\phi^2)^{1/2},$$

where $k(\chi_1^2(\cdot))$ represents the kernel of a non-central
$\chi^2$ density, and the last equality follows from the fact that
$E(\chi_\nu^2(\delta)) = \nu + \delta$. Hence the posterior mean of
$\beta$ conditional on $\bar{\beta}$ and $\phi$ exists.

The conditional p.d.f. in (17) is directly comparable with
expression (14) in Vasicek (1973). The major difference is that here
the non-central $\chi^2$ kernel replaces the normal kernel found in
Vasicek's result. The "t-like" term is common to both expressions. It
is also worth recalling that in the hierarchical approach the two
exponential terms were integrated out of the p.d.f.

The comparison with Vasicek's results is perhaps clearer if one
uses a normal approximation to the non-central $\chi^2$ prior for
$\beta$. The approximated prior is [see Johnson and Kotz (1970,
p. 139)]

$$\pi(\beta) = \frac{1}{\sqrt{2\pi}}\, \frac{A}{\sqrt{2}\,\phi}\, \beta^{-1/2} \exp\left\{-\frac{1}{2}\left(\frac{A\sqrt{2}}{\phi}\sqrt{\beta} - B\right)^2\right\}$$

where $A = [(1+\delta)/(1+2\delta)]^{1/2}$,
$B = [(1 + 2\delta^2 + 2\delta)/(1+2\delta)]^{1/2}$ and
$\delta = (\bar{\beta}/\phi)^2$ is the non-centrality parameter.

The same arguments as before yield the conditional posterior
p.d.f. of $\beta$,

$$\pi(\beta \mid y, x, \bar{\beta}, \phi) \propto \beta^{-1/2} \exp\left\{-\frac{1}{2}\left(\frac{A\sqrt{2}}{\phi}\sqrt{\beta} - B\right)^2\right\} \left[(T-1) + \left(\frac{\beta - \hat{\beta}}{s_\beta}\right)^2\right]^{-T/2}.$$

This expression highlights what was already remarked when
commenting on the marginal (unconditional) p.d.f. of $\beta$: from a
practical standpoint the term $\beta^{-1/2}$ appears to be the crucial
modifier, as it implies a thicker left tail than the one obtained
using a normal prior. Therefore the posterior mean will be closer to
zero.

Applied work with the above posteriors will be slightly messy.
Simpler results can be obtained by using a central $\chi^2$
approximation to the prior for $\beta$. A central $\chi^2$
approximation to our prior p.d.f. is [see Johnson and Kotz (1970,
p. 139)]

$$\chi_1^2(\delta) \sim c\,\chi_f^2$$

where $c = (1+\delta)^{-1}(1+2\delta)$,
$f = 1 + \delta^2(1+2\delta)^{-1}$ and $\delta = (\bar{\beta}/\phi)^2$.
Therefore, the prior for $\beta$ is a gamma density whose kernel is

$$\beta^{f/2 - 1} e^{-\beta/h}, \qquad \beta > 0,$$

where $h = 2c\phi^2$. It is worth noting that, even though $2c\phi^2$
is, for fixed $f$, a scale parameter, $f$ is not a location parameter.
It is therefore difficult to develop a hierarchical model with
non-informative second stage priors based upon this central $\chi^2$
approximation.
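One way to see what the constants $c$, $f$ and $h$ do is to compare moments. The sketch below computes them for the hyperparameter values the text quotes for Figures 2 and 3 and checks that the resulting gamma density reproduces the mean and variance of $\beta = \phi^2\chi_1^2(\delta)$. The function name `central_chi2_approx` is hypothetical, and the moment formulas for the non-central $\chi^2$ (mean $\nu + \delta$, variance $2(\nu + 2\delta)$) are standard results rather than statements from the paper.

```python
def central_chi2_approx(beta_bar, phi):
    """Constants of the central chi-square (gamma) approximation to the prior."""
    delta = (beta_bar / phi) ** 2          # non-centrality parameter
    c = (1 + 2 * delta) / (1 + delta)
    f = 1 + delta ** 2 / (1 + 2 * delta)
    h = 2 * c * phi ** 2                   # gamma scale parameter
    return delta, c, f, h


# Hyperparameter values quoted in the text for Figures 2 and 3
beta_bar, phi = 1.03, 0.22
delta, c, f, h = central_chi2_approx(beta_bar, phi)

# Moments of a gamma density with shape f/2 and scale h
gamma_mean = (f / 2) * h
gamma_var = (f / 2) * h ** 2

# Moments of beta = phi^2 * chi2_1(delta):
# mean phi^2 (1 + delta), variance 2 phi^4 (1 + 2 delta)
exact_mean = phi ** 2 * (1 + delta)
exact_var = 2 * phi ** 4 * (1 + 2 * delta)
```

The two sets of moments coincide, so in this sense the approximation matches the first two moments of the root-normal prior while replacing it with a more tractable gamma form.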
Using the same model as before and integrating $\rho$ out one gets
the conditional posterior density of $\beta$,

$$\pi(\beta \mid y, x, f, h) \propto \beta^{f/2 - 1} e^{-\beta/h} \left\{(T-1) + \left(\frac{\beta - \hat{\beta}}{s_\beta}\right)^2\right\}^{-T/2}, \qquad \beta > 0.$$

The plot in Figure 3 illustrates the shape of this density. The
values of $f$ and $h$ are those implied by $\bar{\beta} = 1.03$ and
$\phi = .22$ (cf. Figure 2), and again $\hat{\beta} = 1$,
$s_\beta^2 = .5$ and $T = 20$.

It can easily be shown that the posterior mean exists. Indeed,
for $T \ge 3$,

$$\int_0^\infty \beta\,\pi(\beta \mid y, x, f, h)\, d\beta < \int_0^\infty \beta^{f/2} e^{-\beta/h}\, d\beta = \Gamma(f/2 + 1)\, h^{f/2 + 1}$$

where the last equality follows from the gamma p.d.f.

It is apparent that, at least for the parameter values used, the
posterior p.d.f. is unimodal and almost symmetric though slightly
skewed to the right. The least squares estimator can be larger or
smaller than the modal value. In fact, straightforward algebra
yields

$$\mathrm{sign}\left\{\left.\frac{d\pi(\beta \mid y, x, f, h)}{d\beta}\right|_{\beta = \hat{\beta}}\right\} = \mathrm{sign}\left\{h\left(\frac{f}{2} - 1\right) - \hat{\beta}\right\}.$$

Therefore, if $\hat{\beta} > h(\frac{f}{2} - 1)$ it follows from the
shape of the p.d.f. that the Bayes estimator $\beta^*$ under a
quadratic loss will satisfy $\beta^* \le \hat{\beta}$. If we take
Bayes estimates as an improved predictor for beta, then the above
observation agrees with the findings of earlier researchers that
relatively high and low OLS beta estimates tend to overpredict and
underpredict, respectively, the corresponding betas for the subsequent
time period [see, e.g., Blume (1971) and Klemkosky and Martin (1975)].
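The sign condition can be traced through the log-derivative of the conditional posterior under the central $\chi^2$ approximation. The following worked step is a sketch that assumes $\hat{\beta} > 0$ and $h > 0$, so that the denominator in the last expression is positive.

```latex
\frac{d}{d\beta}\log\pi(\beta \mid y,x,f,h)
  = \frac{f/2 - 1}{\beta} - \frac{1}{h}
    - \frac{T\,(\beta-\hat\beta)/s_\beta^{2}}
           {(T-1) + \{(\beta-\hat\beta)/s_\beta\}^{2}},
\qquad\text{so at } \beta=\hat\beta:\quad
\frac{f/2 - 1}{\hat\beta} - \frac{1}{h}
  = \frac{h\,(f/2 - 1) - \hat\beta}{h\,\hat\beta}.
```

The t-like term contributes nothing at $\beta = \hat{\beta}$, so the direction of the slope there is decided entirely by the gamma prior.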

5.   Concluding  Remarks 

We have presented only some theoretical results. It would be
interesting to apply our procedures to real data, and see whether that
leads to improved forecasts for systematic risk. On the theoretical
side, some other prior distribution can be used instead of a
non-central $\chi^2$ distribution. One possibility is to use a mixture
of two (or a few) normal distributions. A second possibility is to
take a normal prior for $\beta_i$ with mean modelled in terms of a
regression function of some firm-specific variables. The prior
variance could also be defined from a regression model. Lastly, a
number of other approximations for the non-central $\chi^2$
distribution are available. For example, $\chi_1^2(\delta)$ can be
approximated by a central $\chi^2$ with $1+2v$ degrees of freedom
where $v$ is a Poisson random variable with mean $\delta/2$.
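The Poisson route can be illustrated by simulation. The sketch below uses the standard Poisson-mixture form, in which a non-central $\chi^2$ with one degree of freedom is drawn as a central $\chi^2$ with $1 + 2v$ degrees of freedom and $v$ is Poisson with mean $\delta/2$. The function names, the seed, the sample size and the value of $\delta$ are all hypothetical choices made for illustration.

```python
import math
import random


def poisson_draw(rng, mean):
    """Knuth's multiplication method for a Poisson variate (small means only)."""
    limit = math.exp(-mean)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def noncentral_chi2_1_via_mixture(rng, delta):
    """Draw chi2_1(delta) as a central chi-square with 1 + 2v d.f.,
    where v ~ Poisson(delta / 2)."""
    v = poisson_draw(rng, delta / 2.0)
    dof = 1 + 2 * v
    # A central chi-square with `dof` d.f. is Gamma(shape=dof/2, scale=2)
    return rng.gammavariate(dof / 2.0, 2.0)


rng = random.Random(12345)     # fixed seed, hypothetical choice
delta = 2.0                    # hypothetical non-centrality parameter
draws = [noncentral_chi2_1_via_mixture(rng, delta) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)   # theoretical mean is 1 + delta
```

The sample mean should sit close to $1 + \delta$, the mean of a non-central $\chi^2$ with one degree of freedom, which gives a simple sanity check on the mixture construction.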